Preparation of MaDiTS corpus for Malay dialect translation and speech synthesis system

نویسندگان

  • Yen-Min Jasmina Khaw
  • Tien Ping Tan
چکیده

This paper presents our work in acquiring a Malay dialect translation and speech synthesis corpus. In this study, an architecture of speech corpus acquisition, which including Malay dialect translation and Malay dialect grapheme to phoneme (G2P), was proposed. The pronunciation dictionary for dialectal Malay was generated through G2P tool. As dialectal Malay is considered as scarce resource, dialectal translation rules were developed for translating standard Malay text into dialectal Malay. With this, Kelantanese Malay is chosen in this research as it is considered as one of the Malay dialect from Kelantan, which positioned in the northeast of Peninsular Malaysia. This dialect is very distinctive. Evaluation results showed that the selected sentences through proposed approach has a correlation coefficient of about 0.99, which mean that it is phonetically well balanced.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Parallel Speech Corpora of Japanese Dialects

Clean speech data is necessary for spoken language processing, however, there is no public Japanese dialect corpus collected for speech processing. Parallel speech corpora of dialect are also important because real dialect affects each other, however, the existing data only includes noisy speech data of dialects and their translation in common language. In this paper, we collected parallel spee...

متن کامل

Corpus Design for Malay Corpus-based Speech Synthesis System

Problem statement: Speech corpus is one of the major components in corpus-based synthesis. The quality and coverage in speech corpus will affect the quality of synthesis speech sound. Approach: This study proposes a corpus design for Malay corpus-based speech synthesis system. This includes the study of design criteria in corpus-based speech synthesis, Malay corpus based database design and the...

متن کامل

Globalization, Standardization, and Dialect Leveling in Iran

This paper is an attempt to shed light on the effects of modernization, urbanization, monolingual educational system, and mass media as well as the process of globalization on dialect leveling among Persian dialects. In so doing, the first part of the paper elaborates on the relationship between globalization and sociolinguistics, and on the concept of standardization. Also, it discusses some ...

متن کامل

An Overview of BPPT's Indonesian Language Resources

This paper describes various Indonesian language resources that Agency for the Assessment and Application of Technology (BPPT) has developed and collected since mid 80’s when we joined MMTS (Multilingual Machine Translation System), an international project coordinated by CICCJapan to develop a machine translation system for five Asian languages (Bahasa Indonesia, Malay, Thai, Japanese, and Chi...

متن کامل

Grapheme to phoneme conversion: an Arabic dialect case

We aim to develop a Speech-to-Speech translation system between Modern Standard Arabic and Algiers dialect. Such a system must include a Text-to-Speech module which itself must include a Grapheme-to-Phoneme converter. Algiers dialect is an Arabic dialect concerned by the most problems of Modern Standard Arabic in NLP area. Furthermore, it could be considered as an under-resourced language becau...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014